Run these two commands once, then restart the notebook:
jupyter nbextension enable --py --sys-prefix qgrid
jupyter nbextension enable --py --sys-prefix widgetsnbextension

The formula we use is essentially `raw score * (min(weight total, 10) / weight total)`, where the cap of 10 can be tuned for different results.

We also use the user score when deciding the weight a target show should get. This can be changed, but by default each target gets multiplied by `(user score - 5)`, so shows rated below 5 contribute negatively.
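The two formulas above can be sketched as plain functions. The names (`capped_weight`, `user_score_multiplier`) and the example numbers are illustrative assumptions, not the notebook's actual identifiers.

```python
def capped_weight(raw_score, weight_total, cap=10):
    """Scale a raw score so that a show's total weight is capped:
    raw_score * (min(weight_total, cap) / weight_total)."""
    return raw_score * (min(weight_total, cap) / weight_total)

def user_score_multiplier(user_score):
    """Shift a 1-10 user score so scores below 5 contribute negatively."""
    return user_score - 5

# A show with weight_total 25 has its raw score scaled by 10/25:
weight = capped_weight(raw_score=2.0, weight_total=25)    # 2.0 * (10 / 25) = 0.8
# Targets from a show the user rated 8 get multiplied by (8 - 5) = 3:
contribution = weight * user_score_multiplier(8)          # 0.8 * 3 = 2.4
```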

Can adjust balancing and shifting_avg, as well as genre list for the inputs/sources.

TARGET NODES

Every section after this was imported from another notebook and is optional / less applicable. This section takes the combined target weights from all of the user's watched shows.

The following tables let you sort by any column. Both are useful for determining different things, and both are general indicators of how strongly a show should be recommended based on your watch list. If you are unsure which to use, go with User Weight.

Score Diff / Score Diff z-score is good for finding shows that you would tend to rate higher than others do (it's based on the difference between the score you give a show and the show's average score).
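The Score Diff columns can be sketched with pandas. The table and column names (`user_score`, `avg_score`) here are assumptions for illustration, not the notebook's actual schema.

```python
import pandas as pd

# Hypothetical table of shows; column names are assumptions.
df = pd.DataFrame({
    "show": ["A", "B", "C"],
    "user_score": [9, 6, 7],
    "avg_score": [7.2, 7.8, 6.9],
})

# Score Diff: how much higher you rate a show than the average viewer does.
df["score_diff"] = df["user_score"] - df["avg_score"]

# z-score of the diff, putting shows on a common scale.
df["score_diff_z"] = (
    (df["score_diff"] - df["score_diff"].mean()) / df["score_diff"].std()
)
```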

Looking at shows we've watched only (somewhat equivalent to a training set)

Looking at other shows (somewhat equivalent to a test set)


OTHER

Convert to NetworkX graph

2. Random graph stuff

Copied from website

Some basic visualizations

6. Import Bokeh

5.1 Random examples from documentation

Centrality

Print the five highest-scoring nodes for each centrality measure. Across these measures we see more or less the same names popping up, but not in the same order.
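A minimal sketch of that top-5 printout, using NetworkX's built-in karate club graph as a stand-in for the notebook's graph (which centrality measures the notebook actually computes is an assumption here):

```python
import networkx as nx

# Illustrative graph; swap in the notebook's graph object.
G = nx.karate_club_graph()

measures = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
}

# For each measure, sort nodes by score and print the top five.
for name, scores in measures.items():
    top5 = sorted(scores, key=scores.get, reverse=True)[:5]
    print(name, top5)
```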

9. Directed Graph Creation w/ HITS attribute (from undirected)

Converting from an undirected graph to a directed one in BERT's CSV:

Hub/authority are the same because the graph is undirected. When comparing the two nodes for an edge,

9.2.1 Creating the directed graph

Tables needed from before:

Dictionary with files as keys and hub/authority as values

Modifying df

Now that we have hub/authority values in the same table as edgelist, we can build a directed graph manually.
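One way to sketch the manual build. The orientation rule used here (point each edge toward the endpoint with the higher authority score) is an assumption for illustration, not necessarily the notebook's exact convention:

```python
import networkx as nx

# Undirected example graph; swap in the notebook's graph.
G = nx.karate_club_graph()
hubs, auths = nx.hits(G)  # identical dicts for an undirected graph

# Assumed convention: direct each edge toward the higher-authority endpoint.
D = nx.DiGraph()
D.add_nodes_from(G.nodes(data=True))
for u, v in G.edges():
    if auths[u] >= auths[v]:
        D.add_edge(v, u)
    else:
        D.add_edge(u, v)

print(D.number_of_edges())  # same edge count as the undirected graph
```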

Bokeh example

11. K-Means Clustering

K-means is considered by many to be the gold standard for clustering due to its simplicity and performance, so it's the first one we'll try.

When you have no idea at all what algorithm to use, K-means is usually the first choice.

K-means assumes roughly spherical, separable clusters whose points sit close to the cluster mean. Because of this, it may underperform on clusters with irregular shapes or very different sizes.
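A minimal fit on toy 2-D data, just to show the scikit-learn interface; in the notebook the feature matrix would come from the graph (e.g. node embeddings), which is an assumption here:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious blobs of three points each.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])

# n_init restarts the algorithm from several random centroids
# and keeps the best result.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)          # two groups of three points
print(km.cluster_centers_)
```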

11.1 Clustering, Connectivity and other Graph properties using Networkx

https://www.geeksforgeeks.org/python-clustering-connectivity-and-other-graph-properties-using-networkx/

Returns a Dictionary with clustering value of each node

Get the Clustering value for the whole Graph (0 - 1)

Transitivity of a Graph = 3 * Number of triangles in a Graph / Number of connected triads in the Graph.

In other words, it is the fraction of all possible triangles present in G (also in the range 0-1).
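The three quantities above on a tiny example graph (one triangle plus a tail); the graph itself is made up for illustration:

```python
import networkx as nx

G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4)])  # triangle 1-2-3 plus tail 3-4

per_node = nx.clustering(G)      # dict: node -> clustering coefficient
avg = nx.average_clustering(G)   # mean of the per-node values, in [0, 1]
trans = nx.transitivity(G)       # 3 * triangles / connected triads, in [0, 1]

print(per_node)  # nodes 1 and 2 are fully clustered, node 4 has no triangles
print(avg, trans)
```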

Now we know that the graph is connected. NetworkX provides a number of built-in functions to check the various connectivity features of a graph; if it were not connected, many of these functions would be useful.

A few important characteristics of a graph

11.2 Clustering Setup

https://www.learndatasci.com/tutorials/k-means-clustering-algorithms-python-intro/

For small / moderate graphs, one simple option is to store the node names as a list:

Added node_id and neighbor_id to look up the index of a node name, because the indices must be integers, not strings.

Inspired by: https://stackoverflow.com/questions/25160191/mapping-from-a-nodes-name-to-its-index-and-vice-versa-in-networkx ; this might not scale well to huge graphs, though.
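The name-to-index mapping described above can be sketched as a dictionary built from the node list; the show names here are placeholders:

```python
import networkx as nx

G = nx.Graph([("naruto", "bleach"), ("bleach", "one piece")])

# node_id: name -> integer index, so lookups can use ints instead of strings.
nodes = list(G.nodes())
node_id = {name: i for i, name in enumerate(nodes)}

# Integer edge list, e.g. for libraries that reject string node labels.
int_edges = [(node_id[u], node_id[v]) for u, v in G.edges()]

print(node_id)
print(int_edges)
```

The reverse mapping is just `nodes[i]`, since the list preserves insertion order.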

11.3 Fitting K-Means & Other Clustering Models

Agglomerative Clustering

The main idea behind Agglomerative clustering is that each node first starts in its own cluster, and then pairs of clusters recursively merge together in a way that minimally increases a given linkage distance.
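A minimal fit with scikit-learn; the toy points are made up, and Ward linkage is chosen here as one common linkage criterion (which linkage the notebook uses is an assumption):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])

# Ward linkage merges, at each step, the pair of clusters whose union
# least increases the total within-cluster variance.
agg = AgglomerativeClustering(n_clusters=2, linkage="ward").fit(X)
print(agg.labels_)
```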

Spectral Clustering

Affinity Propagation

Affinity propagation performs really well on several computer vision and biology problems, such as clustering pictures of human faces and identifying regulated transcripts.

Fitting each model:

11.4 Metrics & Plotting

Requires y_true

11.5 Visualization

Output for K-means clustering

12. Node2Vec with K-Means Clustering

Combining the work of sections 7, 8, and 11.

Need to turn clustering labels into dictionary with nodes as keys.
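That conversion can be sketched by zipping the graph's node order with the label array; the graph, the stand-in "embeddings", and the attribute name `cluster` are assumptions for illustration (in the notebook the embeddings would come from Node2Vec):

```python
import networkx as nx
import numpy as np
from sklearn.cluster import KMeans

G = nx.Graph([("a", "b"), ("c", "d")])

# Stand-in 2-D embeddings, one row per node in G's node order.
X = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.1]])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Dictionary with nodes as keys and cluster labels as values,
# ready for nx.set_node_attributes.
node_labels = {node: int(lab) for node, lab in zip(G.nodes(), labels)}
nx.set_node_attributes(G, node_labels, name="cluster")
print(node_labels)
```

This relies on the rows of `X` being in the same order as `G.nodes()`, so the embedding matrix must be built in that order.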

Copied from part 8

Example value of an embedding:

NetworkX to GEXF (Gephi)
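The export itself is a one-liner; the filename is a placeholder, and the karate club graph stands in for the notebook's graph:

```python
import networkx as nx

G = nx.karate_club_graph()
nx.write_gexf(G, "graph.gexf")  # open this file in Gephi

# Round-trip check: the exported file reads back with the same size.
H = nx.read_gexf("graph.gexf")
print(H.number_of_nodes(), H.number_of_edges())
```

Note that `read_gexf` returns node labels as strings, so the round-tripped graph's node names may differ in type from the original.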